Desiderata For Tagging With WordNet Synsets Or MCCA Categories

Author

  • Kenneth C. Litkowski
Abstract

Minnesota Contextual Content Analysis (MCCA) is a technique for characterizing the concepts and themes occurring in text (sentences, paragraphs, interview transcripts, books). MCCA tags each word with a category and examines the distribution of categories against norms representing general usage of categories. MCCA also scores texts in terms of social contexts that are similar to different functions of language. Distributions can be analyzed using non-agglomerative clustering to characterize the concepts and themes. MCCA categories have been mapped to WordNet senses. The defining characteristics that emerge from the mapping and the statistical techniques used in MCCA for analyzing concepts and themes suggest that tagging with WordNet synsets or MCCA categories may produce epiphenomenal results that are misleading. We suggest that WordNet synsets and MCCA categories be augmented with further lexical semantic information for use after text is tagged or categorized. We suggest that such information is useful not only for the primary purposes of disambiguation in parsing and text classification in content analysis and information retrieval, but also for tasks in corpus analysis, discourse analysis, and automatic text summarization.
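
The tag-and-compare step described in the abstract can be illustrated with a minimal sketch. It assumes a toy lexicon and toy norms: the category names (`SELF`, `OBLIGATION`, `MOTION`), the word lists, the norm values, and the function names are all hypothetical and are not taken from MCCA itself. It only shows the general idea of tagging each word with a category and measuring how the observed category proportions deviate from expected ones.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> category label.
# These are illustrative stand-ins, not actual MCCA categories.
LEXICON = {
    "we": "SELF",
    "i": "SELF",
    "must": "OBLIGATION",
    "should": "OBLIGATION",
    "go": "MOTION",
    "run": "MOTION",
}

# Hypothetical norms: expected proportion of each category in general usage.
NORMS = {"SELF": 0.5, "OBLIGATION": 0.25, "MOTION": 0.25}


def tag(tokens):
    """Return the category of every token found in the lexicon."""
    return [LEXICON[t] for t in tokens if t in LEXICON]


def category_deviations(tokens):
    """Observed category proportion minus the norm, per category."""
    tags = tag(tokens)
    total = len(tags)
    counts = Counter(tags)
    if total == 0:
        # No tagged words: every category sits below its norm.
        return {cat: -norm for cat, norm in NORMS.items()}
    return {cat: counts[cat] / total - norm for cat, norm in NORMS.items()}


if __name__ == "__main__":
    text = "i must should the".split()
    for cat, dev in category_deviations(text).items():
        print(f"{cat}: {dev:+.3f}")
```

Positive values mark categories that are over-represented relative to the assumed norms, negative values mark under-represented ones. A real analysis would use the full category set and empirically derived norms, which this sketch does not attempt to reproduce.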

Similar resources

Sense-Tagging Chinese Corpus

Contextual information and the mapping from WordNet synsets to Cilin sense tags deal with word sense disambiguation. The average performance is 63.36% when small categories are used, and 1, 2 and 3 candidates are proposed for low, middle and high ambiguous words. The performance of tagging unknown words is 34.35%, which is much better than that of baseline mode. The sense tagger achieves the pe...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Semantify del.icio.us: automatically turn your tags into senses

At present tagging is experiencing wide diffusion as the most adopted way to collaboratively classify resources over the Web. In this paper, after a detailed analysis of the attempts made to improve the organization and structure of tagging systems as well as the usefulness of this kind of social data, we propose and evaluate the Tag Disambiguation Algorithm, mining del.icio.us data. It all...

WordNet Affect: an Affective Extension of WordNet

In this paper we present a linguistic resource for the lexical representation of affective knowledge. This resource (named WORDNETAFFECT) was developed starting from WORDNET, through a selection and tagging of a subset of synsets representing the affective

Towards A Bootstrapping Framework For Corpus Semantic Tagging

Availability of source information for semantic tagging (or disambiguating) words in corpora is problematic. A framework to produce a semantically tagged corpus in a domain specific perspective using as source a general purpose taxonomy (i.e. WordNet) is here proposed. The tag set is derived from higher level WordNet synsets. A methodology aiming to support semantic bootstrapping in a NLP appli...

Publication date: 1997